NSF PAR Search | NSF Public Access Repository

Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations

Ford, James; Zhao, Xingmeng; Schumacher, Dan; Rios, Anthony (May 2025, Proceedings of the 31st International Conference on Computational Linguistics)
Rambow, Owen; Wanner, Leo; Apidianaki, Marianna; Khalifa, Hend; Eugenio, Barbara; Schockaert, Steven (Ed.)
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI’s GPT-3.5 Turbo and Meta’s Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.
Full Text Available
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News

Zhao, Xingmeng; Schumacher, Dan; Nalluri, Sashank; Walton, Xavier; Shrestha, Suhana; Rios, Anthony (May 2025, Proceedings of the International AAAI Conference on Web and Social Media)

Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if news agencies over-report cycling accidents, people may come to perceive cyclists as "dangerous," reducing the number of people who opt to cycle. Additionally, a decline in cycling can result in less government funding for safe infrastructure. In this paper, we develop a method for detecting how cyclists are perceived within news headlines. We introduce a new dataset called "Bike Frames" to accomplish this. The dataset consists of 31,480 news headlines and 1,500 annotations. Our focus is on analyzing 11,385 headlines from the United States. We also introduce the BikeFrame Chain-of-Code framework to predict cyclist perception, identify accident-related headlines, and determine fault. This framework uses pseudocode for precise logic and integrates news agency bias analysis for improved predictions over traditional chain-of-thought reasoning in large language models. Our method substantially outperforms other methods, and most importantly, we find that incorporating news bias information substantially impacts performance, improving the average F1 from .739 to .815. Finally, we perform a comprehensive case study on US-based news headlines, finding reporting differences between news agencies and cycling-specific websites as well as differences in reporting depending on the gender of cyclists. WARNING: This paper contains descriptions of accidents and death.
Full Text Available
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations

Ford, James; Zhao, Xingmeng; Schumacher, Dan; Rios, Anthony (January 2025, Proceedings of the International Conference on Computational Linguistics)

We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI’s GPT-3.5 Turbo and Meta’s Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.
Full Text Available
Translating Natural Language Specifications into Access Control Policies by Leveraging Large Language Models

https://doi.org/10.1109/TPS-ISA62245.2024.00048

Lawal, Sherifdeen; Zhao, Xingmeng; Rios, Anthony; Krishnan, Ram; Ferraiolo, David (October 2024, IEEE)

Full Text Available
UTSA-NLP at ChemoTimelines 2024: Evaluating Instruction-Tuned Language Models for Temporal Relation Extraction

Zhao, Xingmeng; Rios, Anthony (June 2024, Proceedings of the 6th Clinical Natural Language Processing Workshop)
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

Zhao, Xingmeng; Niazi, Ali; Rios, Anthony (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?

https://doi.org/10.18653/v1/2023.conll-babylm.16

Zhao, Xingmeng; Wang, Tongnian; Osborn, Sheri; Rios, Anthony (December 2023, Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning)

Full Text Available
UTSA-NLP at RadSum23: Multi-modal Retrieval-Based Chest X-Ray Report Summarization

https://doi.org/10.18653/v1/2023.bionlp-1.58

Wang, Tongnian; Zhao, Xingmeng; Rios, Anthony (July 2023, The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks)

Full Text Available
A marker-based neural network system for extracting social determinants of health

https://doi.org/10.1093/jamia/ocad041

Zhao, Xingmeng; Rios, Anthony (April 2023, Journal of the American Medical Informatics Association)

Objective: The impact of social determinants of health (SDoH) on patients’ healthcare quality and the disparity is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for automatically extracting them. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to automatically extract SDoH information from clinical notes. Materials and Methods: The study uses the N2C2 Shared Task data, which were collected from 2 sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for 12 SDoHs. In order to handle the issue of overlapping entities, we developed a novel marker-based NER model. We used it in a multi-stage pipeline to extract SDoH information from clinical notes. Results: Our marker-based system outperformed the state-of-the-art span-based models at handling overlapping entities based on the overall Micro-F1 score performance. It also achieved state-of-the-art performance compared with the shared task methods. Our approach achieved an F1 of 0.9101, 0.8053, and 0.9025 for Subtasks A, B, and C, respectively. Conclusions: The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue and further research is needed to improve the extraction of entities with complex semantic meanings and low-frequency entities. We have made the source code available at https://github.com/Zephyr1022/SDOH-N2C2-UTSA.
Full Text Available
Turning Stocks into Memes: A Dataset for Understanding How Social Communities Can Drive Wall Street

https://doi.org/10.1609/icwsm.v16i1.19369

Alvarez, Richard; Bhatt, Paras; Zhao, Xingmeng; Rios, Anthony (June 2022, Proceedings of the International AAAI Conference on Web and Social Media)

Who actually expresses an intent to buy shares of GameStop Corporation (GME) on Reddit? What convinces people to buy stocks? Are people convinced to support a coordinated plan to adversely impact Wall Street investors? Existing literature on understanding intent has mainly relied on surveys and self-reporting; however, there are limitations to these methodologies. Hence, in this paper, we develop an annotated dataset of communications centered on the GameStop phenomenon to analyze the subscriber intention behaviors within the r/WallStreetBets community to buy (or not buy) stocks. Likewise, we curate a dataset to better understand how intent interacts with a user's general support towards the coordinated actions of the community for GameStop. Overall, our dataset can provide insight to social scientists on the persuasive power of social movements online by adopting common language and narrative. WARNING: This paper contains offensive language that commonly appears on Reddit's r/WallStreetBets subreddit.
Full Text Available
